Conversation

Collaborator

@robsyme robsyme commented Jan 2, 2026

Summary

  • Add aws.batch.forceGlacierTransfer configuration option to enable the --force-glacier-transfer flag for AWS CLI S3 download commands
  • Clean up trailing whitespace in S3BashLib

Problem

When Nextflow stages input files from S3 directories containing restored Glacier/Deep Archive objects, AWS CLI skips them even though they are fully accessible. This is because AWS CLI checks the storage class metadata (which remains GLACIER or DEEP_ARCHIVE even after restoration) and refuses to download unless --force-glacier-transfer is specified.

Error message from AWS CLI:

warning: Skipping file s3://bucket/path/file.tif. Object is of storage class GLACIER.
Unable to perform download operations on GLACIER objects.
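
For reference, the same recursive copy succeeds once the flag is passed manually; the bucket and paths below are placeholders:

# Restored objects keep GLACIER/DEEP_ARCHIVE as their storage class, so the CLI
# must be told explicitly to transfer them anyway.
aws s3 cp --recursive --force-glacier-transfer s3://bucket/path/ ./local-dir/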

Solution

Add a new config option:

aws {
    batch {
        forceGlacierTransfer = true
    }
}

When enabled, the nxf_s3_download bash function includes --force-glacier-transfer in aws s3 cp commands, allowing restored Glacier objects to be downloaded.
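
For illustration only, a minimal sketch of the helper with the option enabled; the real generated function also distinguishes single files from directories:

# Minimal sketch, assuming forceGlacierTransfer = true; not the exact generated code.
nxf_s3_download() {
    local source=$1
    local target=$2
    aws s3 cp --only-show-errors --recursive --force-glacier-transfer "$source" "$target"
}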

Test plan

  • Added unit test for new config option in AwsBatchConfigTest
  • Added unit test for glacier transfer flag in S3BashLibTest
  • All existing tests pass

Closes #4747


netlify bot commented Jan 2, 2026

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 06456df
🔍 Latest deploy log https://app.netlify.com/projects/nextflow-docs-staging/deploys/695e858e29c0660008648c46
😎 Deploy Preview https://deploy-preview-6700--nextflow-docs-staging.netlify.app

@robsyme robsyme force-pushed the fix-glacier-force-transfer branch from 1ca5397 to daf9ed7 on January 2, 2026 19:05
Clean up trailing spaces in retryEnv(), s3Lib(), and s5cmdLib()
methods, and update corresponding test expectations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Rob Syme <rob.syme@gmail.com>
@robsyme robsyme force-pushed the fix-glacier-force-transfer branch from daf9ed7 to ee43990 on January 2, 2026 19:17
Collaborator Author

robsyme commented Jan 2, 2026

@claude Review


claude bot commented Jan 2, 2026

Claude finished @robsyme's task.


Code Review Analysis

Reviewing PR #6700: Add aws.batch.forceGlacierTransfer config option

Todo List:

  • Analyze configuration changes in AwsOptions and AwsBatchConfig
  • Review S3BashLib implementation for glacier transfer support
  • Examine test coverage and correctness
  • Check for potential security or performance concerns
  • Verify solution addresses the stated problem
  • Provide comprehensive review feedback

Review Summary

This PR successfully implements the aws.batch.forceGlacierTransfer configuration option to address the AWS Glacier/Deep Archive download issue. The implementation is well-architected and follows Nextflow's established patterns.

✅ Strengths

1. Clean Architecture & Implementation

  • Proper layering: Configuration flows correctly from AwsBatchConfig → AwsOptions → S3BashLib
  • Consistent patterns: Follows existing conventions for config options with proper @ConfigOption annotations and descriptions
  • Type safety: Uses boolean type with proper defaults (false)

2. Targeted Solution

  • Download-only: Correctly applies --force-glacier-transfer only to download operations (nxf_s3_download) in S3BashLib.groovy:135,137
  • No upload impact: Upload operations remain unchanged, which is appropriate since this is a retrieval issue
  • s5cmd compatibility: Implementation correctly excludes the flag for s5cmd operations (which handle Glacier differently)

3. Excellent Test Coverage

  • Unit tests: New test in AwsBatchConfigTest.groovy:158-170 validates config parsing
  • Integration tests: New test in S3BashLibTest.groovy:820-856 verifies bash script generation
  • Edge cases: Tests cover both enabled and disabled states
  • Existing tests updated: All affected test files properly updated to maintain compatibility

4. Documentation & Clarity

  • Clear description: Config option documentation clearly explains purpose and default value
  • Descriptive naming: forceGlacierTransfer name clearly indicates functionality

🔍 Technical Details

Config Flow Analysis

The implementation follows a clean data flow:

  1. AwsBatchConfig.groovy:132,160 - Defines and parses the config option
  2. AwsOptions.groovy:167-169 - Exposes the option to executors
  3. S3BashLib.groovy:94-96,203 - Applies the setting to bash script generation

Bash Script Generation

The generated bash includes the flag conditionally:

# When forceGlacierTransfer = true
aws s3 cp --only-show-errors --recursive --force-glacier-transfer "$source" "$target"

# When forceGlacierTransfer = false (default)
aws s3 cp --only-show-errors --recursive "$source" "$target"

⚠️ Minor Observations

1. Documentation Enhancement Opportunity

The config description could mention that this specifically applies to restored objects:

// Current
"When `true`, the `--force-glacier-transfer` flag is added to AWS CLI S3 download commands, allowing restored Glacier/Deep Archive objects to be downloaded (default: `false`)."

// Suggested enhancement
"When `true`, the `--force-glacier-transfer` flag is added to AWS CLI S3 download commands, allowing **restored** Glacier/Deep Archive objects to be downloaded. Only needed when staging files that have been restored from archive storage (default: `false`)."

2. Performance Consideration

While correct, there's a slight performance consideration: the flag affects all downloads when enabled, not just Glacier objects. This is acceptable since the AWS CLI handles this efficiently, but users should be aware this is a global setting per job.

🚀 Validation

Problem Resolution

The implementation directly addresses the stated problem:

  • ✅ Adds --force-glacier-transfer flag to AWS CLI commands
  • ✅ Applies only to download operations where the issue occurs
  • ✅ Provides user control via configuration
  • ✅ Maintains backward compatibility (disabled by default)

Error Message Resolution

This should resolve the reported error:

warning: Skipping file s3://bucket/path/file.tif. Object is of storage class GLACIER.
Unable to perform download operations on GLACIER objects.

📊 Overall Assessment

Status: ✅ APPROVE - Ready for merge

This is a well-implemented solution that:

  • Solves the stated problem effectively
  • Follows project conventions and patterns
  • Includes comprehensive tests
  • Maintains backward compatibility
  • Has minimal risk and clear scope

The implementation demonstrates good understanding of the Nextflow architecture and AWS Batch integration patterns. No blocking issues identified.


Add configuration option to enable --force-glacier-transfer flag for
AWS CLI S3 download commands. This allows downloading restored
Glacier/Deep Archive objects that retain their storage class metadata.

When enabled via:
  aws.batch.forceGlacierTransfer = true

The nxf_s3_download function will include --force-glacier-transfer
in aws s3 cp commands, enabling downloads of restored Glacier objects
without AWS CLI skipping them due to storage class checks.

Fixes: #4747

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Rob Syme <rob.syme@gmail.com>
@robsyme robsyme force-pushed the fix-glacier-force-transfer branch from ee43990 to 8fc1ffc on January 2, 2026 19:29
Collaborator Author

robsyme commented Jan 2, 2026

In response to the performance consideration raised by @claude:

From what I can see, enabling the flag for standard storage class objects is effectively a no-op - the transfer proceeds normally either way. There are:

  • No extra API calls - it doesn't call HeadObject to verify restoration status (see the example after this list)
  • No extra processing - just bypasses a conditional check
  • No cost difference - same S3 GET requests either way
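
For context, checking restore status explicitly would require a separate call such as the one below (bucket and key are placeholders); the Restore field of the response indicates whether a restored copy is available:

# Example only: inspect an object's restore status by hand.
aws s3api head-object --bucket my-bucket --key path/file.tif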


claude bot commented Jan 2, 2026

Claude finished @robsyme's task.


Claude is reviewing @robsyme's clarification

Todo List:

  • Analyze @robsyme's performance clarification
  • Update review assessment based on new information
  • Provide final review feedback

Updated Code Review Analysis

Thank you for the clarification @robsyme! You're absolutely correct about the performance consideration.

✅ Performance Clarification - Issue Resolved

Your explanation confirms that my minor performance observation was indeed not a concern:

AWS CLI --force-glacier-transfer behavior for standard storage objects:

  • No extra API calls - No HeadObject requests to check restoration status
  • No extra processing - Just bypasses an internal conditional check
  • No cost difference - Same S3 GET requests regardless
  • No-op behavior - Standard storage transfers proceed normally

This means the flag is effectively transparent for non-Glacier objects, making it safe to enable globally when needed without any performance penalty.

📊 Final Assessment

Status: ✅ APPROVE - Ready for merge

The implementation is excellent and my previous "performance consideration" is resolved. This PR:

  • ✅ Solves the Glacier download problem effectively
  • ✅ Has zero performance impact on standard storage objects
  • ✅ Follows project conventions perfectly
  • ✅ Includes comprehensive test coverage
  • ✅ Maintains full backward compatibility

The --force-glacier-transfer flag implementation is both safe and optimal. No concerns remaining.



blajoie commented Jan 2, 2026

Happy to help test as soon as you can point us to a Nextflow dev version! Thanks @robsyme!

@pditommaso pditommaso self-requested a review January 3, 2026 02:34
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman bentsherman requested a review from a team as a code owner January 6, 2026 15:29
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
@bentsherman bentsherman merged commit e3bf315 into master Jan 7, 2026
25 checks passed
@bentsherman bentsherman deleted the fix-glacier-force-transfer branch January 7, 2026 16:20
pditommaso pushed a commit that referenced this pull request Jan 16, 2026
Co-authored-by: Ben Sherman <bentshermann@gmail.com>

Development

Successfully merging this pull request may close these issues.

Recursively downloading S3 directory restored from Glacier doesn't work
